Abstract

Experimental psychologists often use multifactor repeated-measure designs in which interactions are the most important effects to be assessed. An experimenter has at least five ways to evaluate such interactions: (1) a univariate repeated-measures analysis of variance (ANOVA), with (probably) inflated estimates of the degrees of freedom; (2) a univariate repeated-measures ANOVA with the Greenhouse-Geisser conservative estimate of the degrees of freedom; (3) the Greenhouse-Geisser stepwise analysis; (4) a multivariate ANOVA; and (5) specific interaction contrasts. We show that no matter which of the above paths is chosen, the careful experimenter must compute specific interaction contrasts (i.e., t-tests). A worked example is given.
Specific Interaction Contrasts: A Statistical Tool for Repeated-Measures Designs

R. A. Fisher (Fisher & MacKenzie, 1923) used the concepts of interaction and additivity when he invented the Analysis of Variance (ANOVA) method for dealing with multifactor experiments. Until recently, these concepts did not play a primary role in any theoretical models commonly used in experimental psychology. However, in 1969, Sternberg incorporated both concepts directly into his additive-factor method for decomposing reaction time (RT) into processing stages.

Assuming that RT is composed of a number of additive stages in a known order, Sternberg proposed that each stage be studied by influencing its duration with various treatments. Two treatments that influence one or more stages in common should have an interaction effect on RT. But if the two treatments influence different stages, then each should have an additive effect on RT.
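The additive-factor logic can be made concrete with a toy 2 x 2 design. The sketch below is illustrative only: the baseline RT, the stage increments, and the multiplicative combination rule used in the shared-stage case are assumptions chosen for the example, not part of Sternberg's method. It computes the single 1-df interaction contrast (weights +1, -1, -1, +1) for cell means generated under each assumption.

```python
# Hypothetical 2 x 2 illustration of the additive-factor logic.
# Cell means are indexed m[a][b], with a, b in {0, 1} for the treatment levels.

def interaction_contrast(m):
    """1-df interaction contrast: (m00 - m01) - (m10 - m11)."""
    return (m[0][0] - m[0][1]) - (m[1][0] - m[1][1])

BASE = 400.0  # assumed baseline RT in ms, purely illustrative

def separate_stages(a, b):
    # Treatment A lengthens stage 1 by 50 ms; treatment B lengthens stage 2 by 80 ms.
    # Stage durations simply add, so the two effects are additive.
    return BASE + 50.0 * a + 80.0 * b

def shared_stage(a, b):
    # Assumption: both treatments act on the same 200 ms stage and combine
    # multiplicatively there (scaling it by 1.25 and by 1.4).
    stage = 200.0 * (1.25 if a else 1.0) * (1.4 if b else 1.0)
    return BASE - 200.0 + stage

for model in (separate_stages, shared_stage):
    means = [[model(a, b) for b in (0, 1)] for a in (0, 1)]
    print(model.__name__, round(interaction_contrast(means), 2))
# separate_stages 0.0
# shared_stage 20.0
```

Under separate stages the contrast is exactly zero; under the assumed shared-stage rule it is not.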

Taylor (1976) recently extended this methodology to conditions in which some dependence may occur between processing stages. His primary restrictive assumption is that stage dependence must be expressible as a linear function of the stage times involved. Under these conditions, Sternberg's hypothesis concerning additivity does not hold. That is, the absence of a significant interaction between the effects of two variables does not imply that these variables influence different processing stages. However, two treatments influencing one or more processing stages in common still should have an interaction effect on RT. Thus, the important question becomes: How do we test for interactions?
The traditional significance test of interactions is the F-ratio in the analysis of variance (ANOVA). We find two major difficulties in using this test with the form of experimentation advocated by Sternberg (1969) and Taylor (1976). First, multifactor repeated-measure designs, such as those required by Sternberg (1969) and Taylor (1976), do not meet a critical assumption of the ANOVA: independent scores. Second, the F-ratio is a vague test, telling the experimenter almost nothing.

R. A. Fisher's first use (1923) of the ANOVA was to study the effect of treatments on plots of ground. His most important assumption was that the criterion score (yield) for any particular plot was independent of the score for any other plot. The F-ratio, consequently, has some problems in its application to a repeated-measures experiment. For example, Sternberg (1969) and Taylor (1976) advocate exposing a subject to all possible conditions in a multifactor experiment. Even if we assume that the carryover effects of every treatment on all subsequent treatments are negligible, we must deal with the fact that any two scores measured on the same organism can have (and usually do have) nonzero correlations. This makes the usual univariate repeated-measure F-ratio misleading or uninterpretable (e.g., see Lana & Lubin, 1963, and the Justification section of the present paper).

An even greater difficulty is that the calculation of F-ratios in conjunction with a completely general (i.e., unspecified) ANOVA model tells the experimenter almost nothing about a hypothesized interaction effect. A significant F-ratio tells us nothing about the direction, amount, or location of the underlying interaction, while a nonsignificant F-test may simply lack the power to detect a single large interaction effect. J. A. Nelder (see Plackett, 1960, p. 213) has criticized such general ANOVA models on the grounds of vagueness. He suggested, instead, that the model be specialized to fit the particular application, thus gaining power and supplying more information to the experimenter.

Professor G. A. Barnard, also commenting on the paper by Plackett (1960), noted that the ANOVA essentially reduces to a set of independent contrasts¹ and that we are free to select groups of contrasts in any manner we choose (see Plackett, 1960, p. 215). Since efficient use of the ANOVA involves selecting a specific model to fit the chosen application, why not devise specific contrasts to test one's hypotheses? As Geisser has said: "When there are contrasts of scientific importance the omnibus F is irrelevant" (personal communication). Contrast weights specify the size, sign, and location of the putative effects. A contrast can be tested by a t-ratio. In this way we avoid the vagueness and lack of power of the F-ratio. In particular, contrasts do not require the many assumptions necessary for a repeated-measure ANOVA.

Sternberg (1969), in fact, recommended that interactions be evaluated with specific interaction contrasts. Unfortunately, he provided no detailed worked examples of his procedure. Furthermore, the interaction in his one example in which computation is briefly discussed has only one possible degree of freedom, so it does not reveal the true potential of the procedure. We have seen no other use of this technique, so we can only assume that Sternberg's description was insufficient for most of his readers. The purpose of this paper, therefore, is to detail the computational procedures involved and give some theoretical justification for the use of contrasts.
Specific Interaction Contrasts: Computation
A specific interaction contrast is an estimate of the variation due to interaction based on an explicit set of contrast weights. In a multifactor, repeated-measures design, the contrast procedure requires that one first obtain a set of contrast weights. These may be obtained by prior reasoning or from prior empirical knowledge. Each of the p conditions in the interaction must have a contrast weight, and the sum of these weights will be zero, by definition. The next step involves calculating the residuals from additivity (i.e., the interaction effect) for each subject, obtained by subtracting the overall mean and the main effects from a subject's score for each of the p conditions. Again, by definition, the residuals from additivity form a set of numbers that sum to zero. The contrast weights are then applied to the interaction effect of each subject to obtain a contrast score. Given n subjects, the set of n contrast scores can be used in a routine t-test of the null hypothesis, $H_0$, that the interaction effect is zero. If $H_0$ is accepted, then additivity of the main effects is implied only for this particular weighted combination of treatment levels. Generally, the maximum number of independent, specific interaction effects is limited by the degrees of freedom for the interaction term in the ANOVA model.

Let $Y_k$ be the contrast score for the kth subject. Then

$$Y_k = \sum_{i=1}^{r} \sum_{j=1}^{c} w_{ij}\, q_{ijk} \tag{1}$$

where $w_{ij}$ is the contrast weight for the ijth cell, and $q_{ijk}$ is the residual from additivity for the ijth cell within the kth subject.
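Equation 1 is a plain weighted sum over the r x c cells of one subject. The sketch below assumes both the weights and the subject's residuals are stored as nested lists of the same shape; the function name `contrast_score` and the 2 x 2 numbers are hypothetical, not taken from the paper's tables.

```python
def contrast_score(weights, residuals):
    """Y_k of equation (1): sum over cells of w_ij * q_ijk for one subject.

    weights, residuals: r x c nested lists (rows = drug, columns = WF level).
    """
    if len(weights) != len(residuals) or any(
        len(w_row) != len(q_row) for w_row, q_row in zip(weights, residuals)
    ):
        raise ValueError("weights and residuals must have the same r x c shape")
    return sum(
        w * q
        for w_row, q_row in zip(weights, residuals)
        for w, q in zip(w_row, q_row)
    )

# Hypothetical 2 x 2 numbers, for illustration only.
w = [[1, -1], [-1, 1]]
q_k = [[0.5, -0.5], [-0.5, 0.5]]
print(contrast_score(w, q_k))  # 2.0
```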


First, we will give the standard method of computing the residual from additivity. Then we will discuss the more difficult problem of estimating the contrast weights.

Let us look at an example (Tharp, 1975) in which the time required to name pictures is measured under alcohol and baseline conditions. The designated correct names for the stimuli came from five different word-frequency (WF) categories. The deleterious effect of alcohol on verbal reaction time is expected to increase as word frequency decreases, which is an interaction effect.

In this example,² there are five levels of WF and two levels of drug, giving us (5 - 1)(2 - 1) = 4 degrees of freedom for the interaction. Thus, there are four possible independent specific interaction effects. Table 1 illustrates summary data for one subject. There are ten scores per subject.


Table  1  about  here 


Residuals  from  Additivity 

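Let $X_{ij}$ be the average score in row i and column j. First, the effect of the ith row, $R_i$, is equal to:

$$R_i = \frac{1}{5}\sum_{j=1}^{5} X_{ij} - \bar{X}_{\cdot\cdot}, \qquad i = 1, 2 \tag{2}$$

where $\bar{X}_{\cdot\cdot}$ is the grand mean. Second, the effect of the jth column, $C_j$, is equal to:

$$C_j = \frac{1}{2}\sum_{i=1}^{2} X_{ij} - \bar{X}_{\cdot\cdot}, \qquad j = 1, \ldots, 5 \tag{3}$$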




Finally, the residual from additivity, $q_{ij}$, in row i and column j is equal to:

$$q_{ij} = X_{ij} - R_i - C_j - \bar{X}_{\cdot\cdot} \tag{4}$$

Note that $R_i$ and $C_j$ are row and column effects, respectively, not means. For example, for the score in row 1 and column 1 of Table 1, the row effect ($R_1$) is equal to the row mean minus the grand mean (i.e., 859.5 - 1035.3 = -175.8). The column effect ($C_1$) is equal to the column mean minus the grand mean (i.e., 1124.8 - 1035.3 = 89.5). Finally, to obtain the residual from additivity ($q_{11}$), the row effect, the column effect, and the grand mean are all subtracted from the cell score (i.e., 901 - (-175.8) - 89.5 - 1035.3 = -48.0). All the residuals from additivity, computed from Table 1, are shown in Table 2.
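Equations 2 through 4 amount to double-centering each subject's table. The sketch below assumes a subject's data are an r x c nested list (rows = drug, columns = WF); the helper name `residuals_from_additivity` is hypothetical. Because Table 1 itself is not reproduced here, the first check only re-does the cell (1, 1) arithmetic quoted in the text, and the second uses a made-up 2 x 3 table to show that every row and column of residuals sums to zero.

```python
def residuals_from_additivity(x):
    """Return q_ij = X_ij - R_i - C_j - grand mean for one subject's r x c table."""
    r, c = len(x), len(x[0])
    grand = sum(map(sum, x)) / (r * c)
    row_eff = [sum(row) / c - grand for row in x]                # R_i, eq. (2)
    col_eff = [sum(x[i][j] for i in range(r)) / r - grand        # C_j, eq. (3)
               for j in range(c)]
    return [[x[i][j] - row_eff[i] - col_eff[j] - grand           # q_ij, eq. (4)
             for j in range(c)] for i in range(r)]

# Cell (1, 1) of Table 1, using only the numbers quoted in the text.
q_11 = 901 - (-175.8) - 89.5 - 1035.3
print(round(q_11, 1))  # -48.0

# Hypothetical 2 x 3 table: rows and columns of q each sum to zero.
q = residuals_from_additivity([[1, 2, 3], [4, 5, 9]])
print(q)  # [[0.5, 0.5, -1.0], [-0.5, -0.5, 1.0]]
```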


Table  2  about  here 


Each residual is equal to a random error term, $e_{ij}$, plus a putative interaction effect, $\tau_{ij}$.

Determination of Contrast Weights

Assume that the true interaction effect in the ijth cell is $\tau_{ij}$. Then the best estimate of the contrast weight, $w_{ij}$, is $\tau_{ij}$ itself. If the set of ten contrast weights is equal to (or proportional to) the set of ten interaction effects, then the expected contrast score of equation 1 is maximized for weights of a given overall size (sum of squares).

Theoretical Weights. To the extent that an experiment is based on experience, sound judgment, and prior scientific knowledge, one should have little difficulty predicting the amount and direction of any hypothesized interactions. In our example, the effect of alcohol was expected to remain relatively constant from WF1 to WF3 and to increase dramatically at WF4 and WF5. These results were expected because (1) an exponential increase in a priori stimulus-response uncertainty occurs from WF1 to WF5 and (2) the effect of alcohol increases as a function of experimentally defined stimulus-response uncertainty (Tharp, Rundell, Lester, & Williams, 1974). Thus, based on prior knowledge, we might postulate the set of weights shown in Table 3.


Table  3  about  here 


There are two restrictions on these interaction contrast weights. First,

$$\sum_{j=1}^{5} w_{ij} = 0 \quad \text{for every row } i = 1, 2 \tag{5}$$

and second,

$$\sum_{i=1}^{2} w_{ij} = 0 \quad \text{for every column } j = 1, \ldots, 5 \tag{6}$$

One can adjust any set of weights to fit these rules by subtracting from each weight the appropriate row mean, finding column means of the row-adjusted weights, and then subtracting the appropriate column mean from each cell.³
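The row-then-column adjustment can be written directly. In this sketch `center_weights` is a hypothetical helper name, and the raw weights (no effect in the first row, an effect growing toward the last WF level in the second) are made up for illustration; they are not the values of Table 3.

```python
def center_weights(raw):
    """Adjust raw weights so every row and every column sums to zero (eqs. 5, 6).

    Follows the recipe in the text: subtract each row mean, then subtract the
    column means of the row-adjusted weights.
    """
    row_adjusted = [[w - sum(row) / len(row) for w in row] for row in raw]
    n_rows = len(row_adjusted)
    col_means = [sum(row[j] for row in row_adjusted) / n_rows
                 for j in range(len(row_adjusted[0]))]
    # The column means themselves sum to zero, so row sums stay at zero.
    return [[w - m for w, m in zip(row, col_means)] for row in row_adjusted]

centered = center_weights([[0, 0, 0, 0, 0], [1, 1, 1, 3, 5]])
print([[round(w, 2) for w in row] for row in centered])
# [[0.6, 0.6, 0.6, -0.4, -1.4], [-0.6, -0.6, -0.6, 0.4, 1.4]]
```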

Contrast weights are usually given as single digits lying between -9 and +9. Single-digit weights might be obtained by smoothing and dividing all scores by their lowest common denominator. Two-digit accuracy might be justified with 50 or more subjects.

Cross-Validation Procedure. If one cannot ask questions about interaction effects in terms of prior contrast weights, then empirical post hoc contrast weights can be estimated from the cell means for half the experimental subjects (analysis group) by setting $w_{ij} = \bar{q}_{ij}$. The remaining subjects (cross-validation group) then can be used to get an unbiased estimate of each putative interaction effect and to test the obtained effect for significance. This cross-validation method is standard operating procedure for psychometricians in multiple regression, test construction, etc. (Mosier, 1951). Cross-validation is rarely used as such by statisticians. However, the "jackknife method," popularized by Tukey, as well as Geisser's "predictive sample reuse method," can be viewed as generalizations of the usual two-group cross-validation procedure (see Mosteller & Tukey, 1968; Geisser, 1975).

Our example consisted of 24 subjects, so 12 of them (i.e., the analysis or training group) are used to estimate the weights, while the remaining 12 are the cross-validation (testing) group. Table 4 gives the set of weights derived from the analysis group.
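The split-half procedure can be sketched end to end. Everything below is an assumption for illustration: the data are synthetic (24 simulated subjects with an assumed true interaction `TAU`, plus additive subject, drug, and word-frequency effects and cell noise), the random split uses a fixed seed, and the helper names `residuals` and `cross_validate` are hypothetical. The weights are the mean residuals of the analysis group, and the contrast scores and t-ratio come only from the cross-validation group.

```python
import random
import statistics

R, C = 2, 5  # drug levels x word-frequency levels, as in the example

def residuals(x):
    """Residuals from additivity (eqs. 2-4) for one subject's R x C table."""
    grand = sum(map(sum, x)) / (R * C)
    rows = [sum(row) / C - grand for row in x]
    cols = [sum(x[i][j] for i in range(R)) / R - grand for j in range(C)]
    return [[x[i][j] - rows[i] - cols[j] - grand for j in range(C)]
            for i in range(R)]

def cross_validate(subjects, seed=0):
    """Estimate weights on a random half of the subjects; test on the other half."""
    order = list(range(len(subjects)))
    random.Random(seed).shuffle(order)
    half = len(order) // 2
    analysis = [residuals(subjects[k]) for k in order[:half]]
    validation = [residuals(subjects[k]) for k in order[half:]]
    # Post hoc weights: w_ij = mean residual of the analysis group.
    weights = [[statistics.mean(q[i][j] for q in analysis) for j in range(C)]
               for i in range(R)]
    # Contrast scores Y_k (eq. 1) for the cross-validation group only.
    scores = [sum(weights[i][j] * q[i][j] for i in range(R) for j in range(C))
              for q in validation]
    t = statistics.mean(scores) / (statistics.stdev(scores) / len(scores) ** 0.5)
    return weights, scores, t

# Synthetic data (NOT the paper's): assumed true interaction TAU plus
# subject, drug, and word-frequency effects and cell noise.
TAU = [[15, 15, 15, -5, -40], [-15, -15, -15, 5, 40]]
rng = random.Random(1)
subjects = []
for _ in range(24):
    level = rng.gauss(1000, 100)
    drug = [0.0, rng.gauss(150, 30)]
    wf = [rng.gauss(0, 50) for _ in range(C)]
    subjects.append([[level + drug[i] + wf[j] + TAU[i][j] + rng.gauss(0, 10)
                      for j in range(C)] for i in range(R)])

weights, scores, t = cross_validate(subjects)
print(len(scores), round(t, 2))
```

Because the weights are mean residuals, they satisfy equations 5 and 6 automatically; the t-ratio has 11 degrees of freedom here.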


Table  4  about  here 


Notice that these values are quite similar to our theoretical values in that most of the alcohol effect occurs at WF5.

Significance  Tests 

The significance test will have maximum power when the interaction effect, $\tau_{ij}$, for each cell is equal to, or proportional to, the contrast weight for that cell. For simplicity in what follows, we omit the case of proportionality. Thus, the experimenter's hypothesis is:

$$H_1:\ \tau_{ij} = w_{ij} \tag{7}$$
and the null hypothesis is:

$$H_0:\ \tau_{ij} = 0 \tag{8}$$

Both hypotheses assume that $\tau_{ij}$ equals the expected value of $q_{ij}$, so $H_0$ is equivalent to $E(\bar{q}_{ij}) = 0$, where

$$\bar{q}_{ij} = \frac{1}{n}\sum_{k=1}^{n} q_{ijk}$$

and n refers to the number of subjects.⁴ The contrast score for the kth subject is defined by equation 1. The expected value of Y, computed over the n subjects, will equal zero if $H_0$ is true. If $H_1$ is true, then the expected value is:

$$E(Y) = \sum_{i}\sum_{j} w_{ij}\,\tau_{ij} = \sum_{i}\sum_{j} w_{ij}^{2}$$

We now have a between-subjects t-test, eliminating the repeated-measures problems.

Theoretically, under $H_1$ the $Y_k$ scores for the cross-validation group will center on $\sum_i \sum_j w_{ij}^2$. Under $H_0$, the $Y_k$ scores can be positive or negative, centering on zero. The conventional t-test is simple to obtain:


$$t = \frac{\bar{Y}}{s_Y / \sqrt{n}}$$

where $s_Y$ is the standard deviation of the n contrast scores, $H_0$ is $E(Y) = 0$, $H_1$ is $E(Y) > 0$, and the t-ratio has (n - 1) degrees of freedom.
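The one-sample t-ratio on the contrast scores needs only the standard library. This is a minimal sketch (the name `contrast_t` is hypothetical); it returns the t-ratio and its degrees of freedom and leaves the one-tailed comparison against a tabled critical value to the reader. If SciPy happens to be available, `scipy.stats.t.sf(t, df)` gives the one-tailed p-value.

```python
import math
import statistics

def contrast_t(scores):
    """One-sample t-ratio for H0: E(Y) = 0 against H1: E(Y) > 0."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two contrast scores")
    mean = statistics.mean(scores)
    se = statistics.stdev(scores) / math.sqrt(n)  # stdev uses the n - 1 divisor
    return mean / se, n - 1

t, df = contrast_t([2.0, 4.0, 6.0])  # hypothetical contrast scores
print(round(t, 4), df)  # 3.4641 2
```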
To complete our example, Table 5 illustrates the scores for the cross-validation group using the weights obtained from the analysis group (right) or the theoretical weights (left).


Table  5  about  here 


For Sternberg (1969), additive effects between two treatments are as meaningful as interactive effects. Thus, he suggests that

... one might present findings in terms of mean interaction contrasts of theoretically interesting magnitudes, and adjust tests so that errors of Types 1 and 2 have equal probabilities with respect to such alternatives. (p. 310)

One could, for example, use the appropriate t-ratio at the 50% level as a rejection point. This adjustment cannot be used to infer additivity, however, within the context of the test we recommend for interactions. That is, rejection of a specific interaction hypothesis does not imply additivity. In view of Taylor's (1976) cautions against interpreting additivity, this inability to infer an additive relationship from a specific test does not appear to be a serious drawback.

Several  Interaction  Hypotheses 

The major interest of the study may not be the comparison of $H_0$ with $H_1$. In some cases, divergent theories might lead to divergent hypotheses as to the nature and direction of the interaction. For example, suppose that some evidence predicted a "golden-mean" theory in which the optimal effect of alcohol occurs at the median word-frequency level, WF3, while decrement occurs at the extreme word-frequency levels, WF1 and WF5. Thus, the theoretical weights might be those shown in Table 6. We now have two experimental hypotheses.


Table  6  about  here 


Let the vector of weights given in Table 4, $V_1$, represent $H_1$, and the vector of weights given in Table 6, $V_2$, represent $H_2$. Each vector will yield a t-ratio: the higher the t-ratio, the better the hypothesis fits the data. To make an accurate estimate of the significance of the t-ratios, the rejection levels must be adjusted to take account of the fact that two similar significance tests have been obtained from the same data.

Although there are many solutions to this multiple-comparison problem (e.g., Miller, 1966), we prefer the Dunn-Bonferroni method (Dunn, 1959). Assume that you want to hold your experimentwise Type 1 error at .05 (i.e., when $H_0$ is true, one or both t-ratios will be significant in no more than five percent of all experiments). In the simplest version of the Dunn-Bonferroni method, the chosen alpha level is divided by the number of comparisons to obtain a critical level of significance for each comparison. Thus, in this example we would divide .05 by two to obtain .025, the level at which each t-test would be evaluated.

This is a very conservative test which guarantees that the experimentwise Type 1 error will be .05 or less, so it lacks some of the power of other tests. In general, it will be more conservative (i.e., the actual Type 1 error will be smaller than .05) as the correlation between $V_1$ and $V_2$ increases. Thus, the experimenter should be careful to avoid testing redundant hypotheses. When the weights of $V_1$ correspond exactly to the true interaction effects, $\tau_{ij}$, then the contrast scores for $V_1$ will account for all interaction variance (i.e., any vector which is orthogonal to $V_1$ will have an expected contrast score of zero).
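How conservative the Dunn-Bonferroni split becomes as the two hypotheses grow more similar can be checked by simulation. This sketch is an approximation under stated assumptions: it replaces the two t-ratios with large-sample z statistics that are correlated at `rho` and both null, and runs two one-tailed tests at .05/2 each. The function name `familywise_error` is hypothetical.

```python
import math
import random
from statistics import NormalDist

def familywise_error(rho, alpha=0.05, reps=20000, seed=0):
    """Monte Carlo P(at least one of two one-tailed tests rejects | H0 true).

    Each test uses the Dunn-Bonferroni level alpha / 2; the two z statistics
    stand in for the t-ratios and are correlated at rho.
    """
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1 - rho * rho) * rng.gauss(0.0, 1.0)
        if z1 > crit or z2 > crit:
            hits += 1
    return hits / reps

for rho in (0.0, 0.5, 0.9):
    print(rho, round(familywise_error(rho), 4))
```

With uncorrelated tests the estimate sits just under .05; as rho approaches 1 it falls toward .025, which is the loss of power the text warns about.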

Most experiments contain several families of statistical hypotheses. The Dunn-Bonferroni adjustment may apply separately to each family (Miller, 1966). For example, one might be interested in the effect of a new treatment on various information-processing stages, and thus introduce that treatment into a multifactor design with several treatments whose locus of effect has already been "established" by means of the Sternberg-Taylor procedure. Such an experiment would involve a two-step analysis. Step one, constituting one family of statistical tests, would involve confirming the interactions between the established treatments in data which did not include the new treatment. The second step, constituting the second family, would be a search for interactions between the effects of the new treatment and the established ones.

Justification

Let us review the remaining four ways to evaluate interactions in a repeated-measures design: (1) the univariate repeated-measures ANOVA using the (probably) inflated estimate of degrees of freedom; (2) multivariate ANOVA; (3) conservative degrees of freedom with a univariate ANOVA; and (4) the Greenhouse-Geisser stepwise analysis. We will demonstrate that no matter what the outcome of the given procedures, the careful experimenter must use specific interaction contrasts.
Univariate Repeated-Measures ANOVA

For the conscientious experimenter, few problems are so exasperating as the analysis of a repeated-measures design. In 1954, Box showed that the exact analysis of a repeated-measures design is a multivariate analysis and that the basic answers must be given in terms of multivariate F-ratios (see also Winer, 1971, sections 4.4 and 4.9). The usual univariate F-ratio approach, set forth in almost every psychological statistics text, is strictly valid only under a set of necessary and sufficient conditions known variously as (a) the "circularity property" (Rouanet & Lépine, 1970); (b) "equality of variances of differences" (Huynh & Feldt, 1970); or (c) "homogeneity of interaction variances" (McNemar, 1962, pp. 315-316).
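Formulation (b) is easy to inspect: the variance of the difference between measures a and b is $s_{aa} + s_{bb} - 2 s_{ab}$, and circularity requires all of these to be equal in the population. The sketch below (hypothetical helper `difference_variances`, made-up covariance matrices) is a descriptive look at a covariance matrix, not a formal test; in a real sample these variances will differ somewhat by sampling error alone.

```python
from itertools import combinations

def difference_variances(cov):
    """Variance of X_a - X_b for every pair of measures a < b:
    var(X_a - X_b) = s_aa + s_bb - 2 * s_ab."""
    return {(a, b): cov[a][a] + cov[b][b] - 2 * cov[a][b]
            for a, b in combinations(range(len(cov)), 2)}

# Equal variances and equal covariances (compound symmetry): circular.
cs = [[2.0 if i == j else 1.0 for j in range(3)] for i in range(3)]
print(difference_variances(cs))  # {(0, 1): 2.0, (0, 2): 2.0, (1, 2): 2.0}

# A made-up covariance matrix that violates circularity.
uneven = [[4.0, 3.0, 0.5], [3.0, 4.0, 0.5], [0.5, 0.5, 1.0]]
print(difference_variances(uneven))  # {(0, 1): 2.0, (0, 2): 4.0, (1, 2): 4.0}
```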

All F-ratios with only one degree of freedom for the numerator are valid under the standard multivariate normal assumptions (see Appendix).

With three or more measures per subject, one could test for the "circularity property" and drop the univariate approach when it does not hold. When the circularity property does hold, one need only worry about interpreting the meaning of a significant F-ratio.

Multivariate  Analysis  of  Variance 

One possible solution for analyzing repeated-measures data is to avoid the univariate approach: do a multivariate ANOVA as soon as we have three or more measures per subject. Unfortunately, the multivariate ANOVA presents many problems unless one has a large number of subjects.

Given rc scores per subject, the number of subjects must be equal to or greater than (rc - 1) in order to compute a multivariate ANOVA. This absolute bar is present because every multivariate ANOVA demands the computation of the inverse of a within-group covariance matrix. When n + 1 is less than the number of measures per subject, the determinant of the covariance matrix must be zero, and the inverse does not exist.

Most experimenters probably can slip under the (rc - 1) barrier with one or two degrees of freedom to spare. If so, then we usually run into the problems of the ill-conditioned covariance matrix and the inherently large sampling variance of correlation coefficients and variances. The idea of estimating the inverse of a 4 x 4 or 5 x 5 covariance matrix, which is based on fewer than 70 degrees of freedom, only appeals to those who have an overwhelming faith in small samples.

The inherent sampling instability of second-order statistics (e.g., correlations, variances, etc.) will interact with small sample size to magnify the problems of the multivariate-normal model. The unbiased estimate of a covariance matrix demands many more assumptions than the unbiased estimate of a difference between two independent means. Some of these assumptions are given in the Appendix. Ordinarily, even when we do not have exact normality or homogeneous variances, the Central Limit Theorem guarantees that the difference between a pair of independent means goes very quickly towards a normal distribution with homogeneous variance, provided that scores are independent and that there are enough of them. But second-order statistics are very sensitive to even slight deviations from normality (e.g., in the fourth moment), and the Central Limit Theorem has very little effect with small samples. Furthermore, if some fundamental assumption such as linearity or independence has been violated, then the Central Limit Theorem simply does not apply. For example, if one measure has a nonmonotonic relation to another, then no increase in the sample size will linearize that relation.
In summary, then, multivariate analysis of variance is an exact way to evaluate repeated-measure interactions. Nevertheless, a multivariate ANOVA is practical only (a) if one has fifty cases or more (given that rc is more than 5) and (b) if the important assumptions of linearity, a well-conditioned covariance matrix, etc., hold (see Appendix).

Conservative Estimate of the Degrees of Freedom

Some investigators have attempted to avoid the repeated-measures problems of the univariate ANOVA by using the conservative test advocated by Greenhouse and Geisser (1959). This approach is based on a statistic, epsilon, developed by Box (1954). When epsilon equals unity, the usual univariate ANOVA of repeated measures is valid. When epsilon is less than unity, multiplying the degrees of freedom for the numerator and the denominator of the univariate F-ratio by epsilon gives the approximate degrees of freedom for a valid evaluation of the univariate F-ratio.
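Box's epsilon can be computed from a covariance matrix of the p repeated measures by projecting it onto a set of orthonormal contrasts, M = C S C^T, and taking epsilon = (tr M)^2 / ((p - 1) tr(M^2)). Applying the same formula to a sample covariance matrix gives the usual Greenhouse-Geisser estimate. This sketch assumes a single set of p measures and uses Helmert contrasts; for the drug x WF interaction one would substitute an orthonormal basis of interaction contrasts. The helper names are hypothetical, and the two test matrices are made up. A sample covariance matrix could be built with `statistics.covariance` (Python 3.10+).

```python
import math

def helmert(p):
    """(p - 1) x p orthonormal contrasts, each orthogonal to the unit vector."""
    rows = []
    for k in range(1, p):
        norm = math.sqrt(k * (k + 1))
        rows.append([1 / norm] * k + [-k / norm] + [0.0] * (p - k - 1))
    return rows

def box_epsilon(cov):
    """Box's epsilon for a p x p covariance matrix of repeated measures."""
    p = len(cov)
    C = helmert(p)
    # M = C S C^T: covariance matrix of the orthonormal contrasts.
    CS = [[sum(C[a][k] * cov[k][b] for k in range(p)) for b in range(p)]
          for a in range(p - 1)]
    M = [[sum(CS[a][k] * C[b][k] for k in range(p)) for b in range(p - 1)]
         for a in range(p - 1)]
    trace = sum(M[a][a] for a in range(p - 1))
    trace_sq = sum(M[a][b] * M[b][a] for a in range(p - 1) for b in range(p - 1))
    return trace ** 2 / ((p - 1) * trace_sq)

# Compound symmetry satisfies circularity, so epsilon = 1.
cs = [[2.0 if i == j else 1.0 for j in range(4)] for i in range(4)]
print(round(box_epsilon(cs), 6))  # 1.0

# All variation along one contrast gives the lower bound 1 / (p - 1).
v = [1.0, -1.0, 0.0, 0.0]
rank_one = [[a * b for b in v] for a in v]
print(round(box_epsilon(rank_one), 6))  # 0.333333
```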

The Greenhouse and Geisser (1959) conservative test avoids the estimation of epsilon. It uses the fact that epsilon cannot go any lower than 1/(p - 1), where p is the number of measures per subject. Consequently, by using this minimum value of epsilon, one usually underestimates the degrees of freedom and the significance level of the F-ratio. If the conservative test is significant, then the multivariate analysis would also be significant. But the Greenhouse-Geisser conservative test will only reject the null hypothesis for rather large F-ratios. When the conservative test is not significant, the experimenter is given very little information.
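The degrees-of-freedom arithmetic for the worked example (r = 2, c = 5, n = 24) can be sketched as follows. Assumption: for an effect with df_effect numerator degrees of freedom, the conservative bound on epsilon is taken as 1/df_effect, which equals 1/(p - 1) in a one-factor design with p measures; for the drug x WF interaction it turns the nominal (4, 92) into (1, 23). The helper names are hypothetical.

```python
def adjusted_df(df_effect, df_error, epsilon):
    """Scale both degrees of freedom of the univariate F-ratio by epsilon."""
    if not 0 < epsilon <= 1:
        raise ValueError("epsilon must lie in (0, 1]")
    return df_effect * epsilon, df_error * epsilon

def conservative_df(df_effect, df_error):
    """Conservative test: epsilon fixed at its lower bound 1 / df_effect."""
    return adjusted_df(df_effect, df_error, 1.0 / df_effect)

# Drug x WF interaction in the example: r = 2, c = 5, n = 24 subjects.
r, c, n = 2, 5, 24
df_effect = (r - 1) * (c - 1)           # 4
df_error = (r - 1) * (c - 1) * (n - 1)  # 92
print(conservative_df(df_effect, df_error))  # (1.0, 23.0)
```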
Greenhouse-Geisser Stepwise Analysis

Greenhouse and Geisser, in describing their stepwise analysis (1959, p. 110), advised that one should test the univariate F-ratio first, using the nominal degrees of freedom without the epsilon correction. When this F-ratio is significant, one must then perform the conservative test. The epsilon correction is needed only when the F-ratio with inflated degrees of freedom is significant and the conservative test is not. This stepwise approach, also recommended by Lana and Lubin (1963), assumes that no further analysis is needed when the univariate F-ratio with possibly inflated degrees of freedom is not significant. This assumption is wrong. Davidson (1972) showed that one could easily obtain a significant Hotelling T² (i.e., a significant multivariate F-ratio) on the same data for which the univariate F-ratio was not significant. Thus, a nonsignificant F, with possibly inflated degrees of freedom, also gives the experimenter very little information.
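The stepwise rules reduce to a short decision procedure. This sketch assumes the three p-values (nominal, conservative, and epsilon-adjusted degrees of freedom for the same F-ratio) have already been computed elsewhere; the function name and the returned labels are hypothetical. The comment at the first branch records the objection raised above.

```python
def stepwise_decision(p_nominal, p_conservative, p_epsilon, alpha=0.05):
    """Greenhouse-Geisser (1959) stepwise rules, given three precomputed p-values.

    p_nominal:      F-ratio with the usual (possibly inflated) degrees of freedom
    p_conservative: same F-ratio with the conservative degrees of freedom
    p_epsilon:      same F-ratio with epsilon-adjusted degrees of freedom
    """
    if p_nominal >= alpha:
        # The stepwise rule stops here, but a multivariate test or a specific
        # interaction contrast on the same data may still be significant.
        return "not significant (stepwise rule stops)"
    if p_conservative < alpha:
        return "significant (conservative test)"
    # Only this branch needs epsilon.
    if p_epsilon < alpha:
        return "significant (epsilon-adjusted)"
    return "not significant (epsilon-adjusted)"

# Hypothetical p-values, for illustration only.
print(stepwise_decision(0.01, 0.09, 0.04))  # significant (epsilon-adjusted)
```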

In summary, if a significant F is obtained with the conservative test or by using epsilon to approximate the valid degrees of freedom, then the only problem is to interpret the meaning of the F-ratio. Otherwise, one can turn to a multivariate ANOVA if enough subjects are available (e.g., 50 or so, given that p is about 5) and the important multivariate assumptions are met.

Specific  Interaction  Contrasts  Versus  the  Omnibus  F-Ratio 

Justification. When a nonsignificant F is obtained either with the Greenhouse-Geisser conservative test or by using epsilon to approximate the valid degrees of freedom, and the conditions are not appropriate for a multivariate ANOVA, only one of our suggested solutions remains: specific interaction contrasts. Such a situation always justifies the use of specific interaction contrasts.
However, the most powerful argument for the use of interaction contrasts is a pragmatic one. Even when F-ratio tests (univariate or multivariate) are appropriate, only two outcomes are possible: a significant F-ratio or a nonsignificant F-ratio. Both outcomes should lead the careful experimenter to test for specific interactions.

A nonsignificant F-ratio may be due to a lack of power, the result of considering all possible interaction contrasts simultaneously. Therefore, a careful experimenter should always apply any prior interaction contrast (derived from hypothesis or an analysis group) to the data to see if the t-test is significant even though the overall F-ratio was not.

If the F-ratio is significant, then the experimenter still has to determine which cells yielded the significant interaction effect, as well as the direction and amount of each cell effect. A significant F-ratio does not guarantee that the experimenter's hypothesis about the interaction is correct. For example, the magnitudes of the cell effects may be as hypothesized, but with opposite signs; all signs may be as hypothesized, but the amounts may be wrong; or the experimental hypothesis may be wrong about both the direction and magnitude of the interaction effects.

In summary, a nonsignificant F-ratio tells one almost nothing. A significant F-ratio is merely a hunting permit, with the interaction contrast and its associated t-test as the weapon.

Advantages. A specific interaction contrast is pragmatic, powerful, robust, and easy to compute. Moreover, as we have shown, it is unavoidable when evaluating interactions in a repeated-measures design.

Specific interaction contrasts allow the experimenter to test the
interaction hypothesis exactly. Since a specific, directional hypothesis
is tested, the t-test performed is one-tailed and therefore more powerful
than the comparable F-ratio whenever the effect lies in the predicted
direction.
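In our own notation (assumed here, not the authors'), let Y_1, ..., Y_n be the subjects' contrast scores with mean Ȳ and standard deviation s_Y. The F-ratio for the same single contrast equals t², which discards the sign of Ȳ; the one-tailed t-test uses only the predicted tail, so when Ȳ falls in the predicted direction its p-value is half that of the F-ratio. The final identity, which converts t into a proportion of variance, agrees with the t and r² entries of Table 5.

```latex
t = \frac{\bar{Y}}{s_Y/\sqrt{n}}, \qquad \mathrm{df} = n - 1, \qquad
F_{1,\,n-1} = t^{2}, \qquad
p_{\text{one-tailed}} = \tfrac{1}{2}\, p_{F}
  \quad (\bar{Y} \text{ in the predicted direction}), \qquad
r^{2} = \frac{t^{2}}{t^{2} + n - 1}.
```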

Only three basic assumptions are required of the data in order to
test an interaction contrast: (1) independence of the n contrast scores,
(2) a normal distribution of those scores, and (3) homogeneous variance.
Independence is guaranteed by independent selection and scoring of the
n subjects.

The latter two assumptions, normality and homogeneous variance, are not
guaranteed, but their importance can be checked by running the Wilcoxon
signed-rank test, which does not assume normality, in tandem with the
t-test. As n increases, departures from normality and homogeneous variance
matter less and less. Thus, specific interaction contrasts, in conjunction
with appropriate rank-order tests, are robust.
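A minimal, standard-library Python sketch of the Wilcoxon signed-rank test on contrast scores, offered as an illustration rather than as the authors' procedure. It ranks the absolute scores, sums the ranks of the positive ones (W+), and obtains the exact one-tailed probability by counting all 2^n sign patterns with a small dynamic program; that exact count assumes no tied or zero scores. The data are the two columns of Table 5.

```python
import math

# Contrast scores from Table 5.
HYPOTHESIZED = [648, 3101, 2055, 4446, 438, 3616, 271, 580, -524, 155, 530, 818]
ANALYSIS_GROUP = [2458, 8293, 6547, 13388, 1096, 9800, 2249, 1888, -1594, -15, 1528, 618]


def signed_rank_statistic(scores):
    """Return (W_plus, n): sum of ranks of positive scores, number of nonzero scores.

    Zero scores are dropped; tied absolute values receive their average rank.
    """
    nonzero = [s for s in scores if s != 0]
    order = sorted(range(len(nonzero)), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * len(nonzero)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` across a run of tied absolute values.
        while (end + 1 < len(order)
               and abs(nonzero[order[end + 1]]) == abs(nonzero[order[start]])):
            end += 1
        average_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    w_plus = sum(r for r, s in zip(ranks, nonzero) if s > 0)
    return w_plus, len(nonzero)


def exact_upper_p(w_plus, n):
    """Exact P(W+ >= w_plus) under H0 (symmetric about zero), assuming no ties."""
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum  # counts[s] = number of sign patterns with W+ == s
    for rank in range(1, n + 1):
        for total in range(max_sum, rank - 1, -1):
            counts[total] += counts[total - rank]
    return sum(counts[math.ceil(w_plus):]) / 2 ** n


for label, scores in [("hypothesized", HYPOTHESIZED),
                      ("analysis group", ANALYSIS_GROUP)]:
    w_plus, n = signed_rank_statistic(scores)
    print(f"{label}: W+ = {w_plus:g}, n = {n}, one-tailed p = {exact_upper_p(w_plus, n):.4f}")
```

For these data only one score in each column is negative or small, giving W+ = 74 (exact one-tailed p = 7/4096, about .0017) and W+ = 72 (p = 14/4096, about .0034), so the rank test agrees with the t-tests of Table 5.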
¹ A contrast is a weighted combination of scores in which the weights
sum to zero.
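A small Python check of this definition against the weight tables of the worked example, written as a sketch. Each table is stored as (alcohol, baseline) pairs for WF1 to WF5. The stronger row-and-column test for an interaction contrast is our addition, stated as an assumption: it is the standard requirement that makes such a contrast insensitive to row and column main effects. The drug main-effect weights are hypothetical and included only for comparison.

```python
# Weights as (alcohol, baseline) pairs for levels WF1..WF5.
TABLE_3 = [(-1, 1), (-1, 1), (-1, 1), (0, 0), (3, -3)]
# Table 4: the column assignment is our reading of the table; the checks
# below give the same answer either way.
TABLE_4 = [(-1, 1), (-5, 5), (1, -1), (-4, 4), (9, -9)]
TABLE_6 = [(-1, 1), (0, 0), (2, -2), (0, 0), (-1, 1)]
# Hypothetical drug main-effect contrast, for comparison only.
DRUG_MAIN_EFFECT = [(1, -1)] * 5


def is_contrast(weights):
    """Footnote 1: the weights of a contrast sum to zero."""
    return sum(w for row in weights for w in row) == 0


def is_interaction_contrast(weights):
    """Assumed standard requirement: every row and every column sums to zero."""
    rows_ok = all(sum(row) == 0 for row in weights)
    cols_ok = all(sum(col) == 0 for col in zip(*weights))
    return rows_ok and cols_ok


for name, weights in [("Table 3", TABLE_3), ("Table 4", TABLE_4),
                      ("Table 6", TABLE_6), ("drug main effect", DRUG_MAIN_EFFECT)]:
    print(name, is_contrast(weights), is_interaction_contrast(weights))
```

The main-effect weights pass the contrast test but fail the interaction test, because their alcohol and baseline columns sum to +5 and -5.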

² Since there are only two levels of drug (i.e., 1 df), one can
subtract baseline from alcohol scores to simplify the computational
procedures for computing residuals from additivity. We have not simplified
in the above example in order to show how such residuals are computed
with both rows and columns. Finding residuals for the simplified
scores is a straightforward generalization of the example given.
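A Python sketch of the residual computation for the data of Table 1, followed by one way (our assumption of how the simplification would be carried out) of using alcohol-minus-baseline differences instead. Each residual is the cell score minus its row mean and column mean, plus the grand mean; the sketch reproduces the residuals of Table 2, and both routes give the same contrast score with the Table 3 weights.

```python
# Table 1: verbal RT in ms as (alcohol, baseline) for WF1..WF5.
TABLE_1 = [(901, 818), (932, 854), (949, 870), (1109, 982), (1733, 1205)]
# Table 3: hypothesized contrast weights, same layout.
WEIGHTS = [(-1, 1), (-1, 1), (-1, 1), (0, 0), (3, -3)]


def residuals_from_additivity(table):
    """Residual_ij = x_ij - (row mean)_i - (column mean)_j + grand mean."""
    n_rows, n_cols = len(table), len(table[0])
    row_means = [sum(row) / n_cols for row in table]
    col_means = [sum(col) / n_rows for col in zip(*table)]
    grand_mean = sum(row_means) / n_rows
    return [[x - row_means[i] - col_means[j] + grand_mean
             for j, x in enumerate(row)]
            for i, row in enumerate(table)]


def contrast_score(values, weights):
    """Weighted sum of cell values: the subject's contrast score."""
    return sum(w * v for v_row, w_row in zip(values, weights)
               for v, w in zip(v_row, w_row))


residuals = residuals_from_additivity(TABLE_1)
for label, row in zip(["WF1", "WF2", "WF3", "WF4", "WF5"], residuals):
    print(label, [round(r, 1) for r in row])

full = contrast_score(residuals, WEIGHTS)

# Two-level shortcut: deviations of the alcohol - baseline differences from
# their mean, weighted by the alcohol column of the weights only.
diffs = [alcohol - baseline for alcohol, baseline in TABLE_1]
mean_diff = sum(diffs) / len(diffs)
shortcut = sum(w_alc * (d - mean_diff) for (w_alc, _), d in zip(WEIGHTS, diffs))

print(round(full, 1), round(shortcut, 1))
```

The two routes agree (1,344 for these data) because, with two columns, each alcohol residual is half the deviation of its difference score and each baseline residual is its negative.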

³ This procedure is equivalent to the method detailed for finding
residuals from additivity.

⁴ We assume that the interaction effect for the ijth cell, $\tau_{ij}$,
is a constant for all n subjects. Any interaction with subjects
is thrown into the error deviance.

⁵ When the specific interaction hypothesis is correct, the contrast
weights will, under most circumstances, be proportional to the actual
interaction effects rather than equal to them. For example, we recommend
smoothing the contrast weights (but not the residuals from additivity for
each subject) to single-digit numbers. Thus, the theoretical maximum value
can be stated more accurately as $k \sum_i \sum_j \tau_{ij}^2$, where $k$
is the constant of proportionality.
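A short derivation of where an expression of the form k ΣΣ τ²_ij comes from, under two assumptions we make explicit: the additive model of footnote 4 (grand mean, row, column and subject effects, an interaction effect τ_ij common to all subjects, and error with mean zero), and interaction-contrast weights whose rows and columns each sum to zero, so that everything except the τ_ij cancels from the expected contrast score. By the Cauchy-Schwarz inequality, for weights of fixed size (fixed ΣΣ w²_ij) this expected value is largest when the weights are proportional to the τ_ij, which is one natural reading of "theoretical maximum".

```latex
Y = \sum_i \sum_j w_{ij} X_{ij}, \qquad
\sum_i w_{ij} = \sum_j w_{ij} = 0
\;\Longrightarrow\;
E(Y) = \sum_i \sum_j w_{ij}\,\tau_{ij};
\qquad
w_{ij} = k\,\tau_{ij}
\;\Longrightarrow\;
E(Y) = k \sum_i \sum_j \tau_{ij}^{2}.
```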

⁶ Each t-ratio may have a different between-subjects variance in this
example. The completely general ANOVA demands that every F-ratio and
t-ratio for interaction have the same error variance to comply with
the assumption of homogeneous variance.

⁷ The routine application of stepwise multiple regression (e.g.,
Dixon, 1975) to the matrix of vectors representing the hypotheses will
guarantee linear independence and thus eliminate redundancy. If there
are h degrees of freedom for the interaction deviance in the ANOVA, then
one can construct at most h mutually orthogonal vectors.
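Stepwise regression is the authors' suggestion; as a swapped-in alternative that makes the same point, the Python sketch below applies Gram-Schmidt orthogonalization to hypothesis vectors and drops any vector that is a linear combination of earlier ones. In the 5 × 2 design of the worked example the interaction deviance has h = (5 - 1)(2 - 1) = 4 degrees of freedom, so at most four vectors can survive. Each vector here is the alcohol column of a weight table; the baseline column is its negative, so orthogonality of the full 10-cell vectors is the same as orthogonality of these 5-element ones. The fourth vector is a hypothetical redundant hypothesis.

```python
import math

# Alcohol-column weights for WF1..WF5 (the baseline column is the negative).
TABLE_3 = [-1, -1, -1, 0, 3]
TABLE_4 = [-1, -5, 1, -4, 9]  # as we read Table 4; negating it changes nothing below
TABLE_6 = [-1, 0, 2, 0, -1]
# Hypothetical redundant hypothesis: the sum of the first two.
REDUNDANT = [a + b for a, b in zip(TABLE_3, TABLE_4)]


def gram_schmidt(vectors, tol=1e-9):
    """Orthonormal basis for the span of `vectors` (modified Gram-Schmidt).

    A vector whose component orthogonal to the basis built so far has norm
    <= tol is treated as redundant and skipped.
    """
    basis = []
    for v in vectors:
        w = [float(x) for x in v]
        for b in basis:
            proj = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


basis = gram_schmidt([TABLE_3, TABLE_4, TABLE_6, REDUNDANT])
print(len(basis), "linearly independent hypothesis vectors")
```

The redundant vector is discarded and three orthonormal vectors remain; each still sums to zero, so each is still an interaction contrast.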

⁸ These "circularity properties" are less restrictive than the
"compound symmetry" property (Votaw, 1948). Compound symmetry holds
when, given p repeated measures, the p variances are equal and the
p(p-1)/2 correlations are identical. Rouanet and Lepine (1970) have
shown that compound symmetry is sufficient but not necessary to
guarantee the validity of all F-ratios in a univariate ANOVA of a
repeated-measures design. To confuse the issue further, the Greenhouse
and Geisser (1959) article has a slip (p. 95): the word "necessary" was
applied to the compound symmetry model instead of "sufficient", as was
implied by their first paper (Geisser and Greenhouse, 1958). This slip
was copied by Lana and Lubin (1963), Gaito (1961), Winer (1962), and
others. Winer corrected this slip in his second edition (1971, pp. 282-283).
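To make the distinction concrete, the Python sketch below checks both properties for a covariance matrix of p repeated measures. For circularity it uses the single-factor form usually attributed to Huynh and Feldt (1970): all p(p-1)/2 differences between pairs of measures have the same variance. That form and the three small matrices are our illustrative assumptions. The second matrix is circular without being compound symmetric, which is the sense in which compound symmetry is sufficient but not necessary.

```python
from itertools import combinations


def is_compound_symmetric(cov, tol=1e-9):
    """Equal variances and equal covariances (hence identical correlations)."""
    p = len(cov)
    variances = [cov[i][i] for i in range(p)]
    covariances = [cov[i][j] for i, j in combinations(range(p), 2)]
    return (max(variances) - min(variances) <= tol
            and max(covariances) - min(covariances) <= tol)


def is_circular(cov, tol=1e-9):
    """All pairwise differences y_i - y_j have the same variance."""
    p = len(cov)
    diff_vars = [cov[i][i] + cov[j][j] - 2 * cov[i][j]
                 for i, j in combinations(range(p), 2)]
    return max(diff_vars) - min(diff_vars) <= tol


# Hypothetical covariance matrices for p = 3 repeated measures.
COMPOUND = [[10, 4, 4], [4, 10, 4], [4, 4, 10]]
CIRCULAR_ONLY = [[6, 3, 4], [3, 8, 5], [4, 5, 10]]  # every difference variance = 8
NEITHER = [[1, 0, 0], [0, 1, 0], [0, 0, 10]]

for name, cov in [("compound", COMPOUND), ("circular only", CIRCULAR_ONLY),
                  ("neither", NEITHER)]:
    print(name, is_compound_symmetric(cov), is_circular(cov))
```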
⁹ Rouanet and Lepine (1970), as well as Huynh and Feldt (1970),
verified that when the circularity property holds, the Box epsilon
criterion equals unity.
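A Python sketch of Box's epsilon (the Greenhouse-Geisser estimate when it is computed from a sample covariance matrix) for p repeated measures. It double-centers the covariance matrix, the same row-and-column operation used for residuals from additivity, and applies ε = [tr(S*)]² / ((p - 1) tr(S*²)). The three matrices are hypothetical illustrations; the first two are circular, the third is not.

```python
def box_epsilon(cov):
    """Box's epsilon for a p x p covariance matrix of repeated measures.

    Double-centers the matrix (S* = H S H with H = I - J/p); its nonzero
    eigenvalues are those of C'SC for any orthonormal set C of p - 1
    contrasts, so epsilon = tr(S*)^2 / ((p - 1) * tr(S* S*)).
    """
    p = len(cov)
    row_means = [sum(row) / p for row in cov]
    col_means = [sum(cov[i][j] for i in range(p)) / p for j in range(p)]
    grand_mean = sum(row_means) / p
    centered = [[cov[i][j] - row_means[i] - col_means[j] + grand_mean
                 for j in range(p)] for i in range(p)]
    trace = sum(centered[i][i] for i in range(p))
    trace_of_square = sum(centered[i][j] * centered[j][i]
                          for i in range(p) for j in range(p))
    return trace ** 2 / ((p - 1) * trace_of_square)


# Hypothetical covariance matrices for p = 3 repeated measures.
COMPOUND = [[10, 4, 4], [4, 10, 4], [4, 4, 10]]
CIRCULAR_ONLY = [[6, 3, 4], [3, 8, 5], [4, 5, 10]]
NOT_CIRCULAR = [[1, 0, 0], [0, 1, 0], [0, 0, 10]]

for name, cov in [("compound", COMPOUND), ("circular only", CIRCULAR_ONLY),
                  ("not circular", NOT_CIRCULAR)]:
    print(name, round(box_epsilon(cov), 4))
```

For these matrices the function returns 1, 1 and 0.64, the last lying within the bounds 1/(p - 1) ≤ ε ≤ 1.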

¹⁰ If some of the measures with small variances have very high
correlations with one another, and if the remaining measures have
high variance and low intercorrelations with all other measures, then
the multivariate ANOVA will unerringly pick out the linear compound
that gives maximum weight to the differences between the means of the
highly correlated measures. The univariate ANOVA gives equal weight
to all differences, and so may end up with a nonsignificant result.
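A two-measure toy version of this mechanism, with made-up numbers, sketched in Python. For one group of difference scores with mean vector d and covariance matrix S, Hotelling's T² equals n d'S⁻¹d and the linear compound it picks out has weights proportional to S⁻¹d; as a rough analogue of equal weighting, the equally weighted sum gives n (1'd)²/(1'S1). By the Cauchy-Schwarz inequality the multivariate quantity is never smaller. The factor n is omitted below since it is common to both.

```python
def inverse_2x2(cov):
    """Closed-form inverse of a 2 x 2 matrix."""
    (a, b), (c, e) = cov
    det = a * e - b * c
    return [[e / det, -b / det], [-c / det, a / det]]


def multivariate_ratio(cov, d):
    """d' S^-1 d: Hotelling's T^2 divided by n."""
    inv = inverse_2x2(cov)
    return sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))


def optimal_weights(cov, d):
    """Weights S^-1 d of the linear compound the multivariate test picks out."""
    inv = inverse_2x2(cov)
    return [inv[i][0] * d[0] + inv[i][1] * d[1] for i in range(2)]


def equal_weight_ratio(cov, d):
    """(1'd)^2 / (1'S1): squared t of the equally weighted sum, divided by n."""
    return sum(d) ** 2 / sum(cov[i][j] for i in range(2) for j in range(2))


# Hypothetical values: two highly correlated measures with unit variance.
COV = [[1.0, 0.9], [0.9, 1.0]]
MEAN_DIFF = [1.0, 0.5]

print(multivariate_ratio(COV, MEAN_DIFF))  # about 1.84
print(equal_weight_ratio(COV, MEAN_DIFF))  # about 0.59
print(optimal_weights(COV, MEAN_DIFF))     # opposite signs: a difference score
```

The selected compound contrasts the two correlated measures, whose difference has small variance (0.2), which is why the multivariate statistic is about three times the equally weighted one here.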
Table 1

Verbal RT in Milliseconds, N = 1

               Alcohol        Baseline       Row Mean
WF1              901            818          X̄1. = 859.5
WF2              932            854          X̄2. = 893
WF3              949            870          X̄3. = 909.5
WF4             1109            982          X̄4. = 1045.5
WF5             1733           1205          X̄5. = 1469

Column Mean   X̄.1 = 1124.8   X̄.2 = 945.8    X̄.. = 1035.3

Table 2

Residuals From Additivity For One Subject

                 Alcohol   Baseline   Row Effects
WF1               -48.0      48.0       -175.8
WF2               -50.5      50.5       -142.3
WF3               -50.0      50.0       -125.8
WF4               -26.0      26.0         10.2
WF5               174.5    -174.5        433.7
Column Effects     89.5     -89.5     X̄.. = 1035.3
Table 3

A Set of Theoretical Contrast Weights

         Alcohol   Baseline   Sum
WF1        -1         1        0
WF2        -1         1        0
WF3        -1         1        0
WF4         0         0        0
WF5         3        -3        0
Sum         0         0


Table 4

Contrast Weights Obtained Empirically from the Analysis Group

              Condition
         Alcohol   Baseline
WF1        -1         1
WF2        -5         5
WF3         1        -1
WF4        -4         4
WF5         9        -9
Table 5

Contrast Scores

Subject    Using Hypothesized Weights    Using Analysis Group Weights
Y1                    648                          2,458
Y2                  3,101                          8,293
Y3                  2,055                          6,547
Y4                  4,446                         13,388
Y5                    438                          1,096
Y6                  3,616                          9,800
Y7                    271                          2,249
Y8                    580                          1,888
Y9                   -524                         -1,594
Y10                   155                            -15
Y11                   530                          1,528
Y12                   818                            618

Ȳ                   1,344.500                      3,854.667
s_Y                 1,574.214                      4,570.038
t                       2.959                          2.922
p                        .006                           .007
r²                       .443                           .437
Table 6

A Golden-Mean Set of Contrast Weights

         Alcohol   Baseline
WF1        -1         1
WF2         0         0
WF3         2        -2
WF4         0         0
WF5        -1         1

Appendix 

Some Assumptions of Multivariate Analysis of Variance

(1) Independence of Subjects. Each subject was selected independently
of any other subject.

(2) Normality. Each of the p measures has a marginal normal distribution.

(3) Homogeneity of Residual Variance (Homoscedasticity). When we
predict one of the p measures from any linear component of the
remaining (p-1) measures, all errors of prediction must have the
same variance.

(4) Linearity. Each of the p measures has a linear regression on any
weighted combination of the remaining (p-1) measures.

(5) Homogeneous Universe. Each subset of subjects in the sample has
the same covariance matrix as any other subset.

(6) Well-conditioned Covariance Matrix. The determinant of the p by p
covariance matrix is clearly greater than zero.
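Assumption (6) can be screened numerically. The Python sketch below computes the determinant of the correlation matrix implied by a covariance matrix, using Gaussian elimination with partial pivoting; unlike the covariance determinant, it lies between 0 and 1 and does not depend on the units of the measures. The cut-off of 0.01 for "clearly greater than zero" is a hypothetical choice of ours, not one given in the text, and the example matrices are illustrative.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [[float(x) for x in row] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0  # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def correlation_matrix(cov):
    """Rescale a covariance matrix to correlations."""
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))]
            for i in range(len(cov))]


def well_conditioned(cov, min_det=0.01):
    """Hypothetical screening rule: det(correlation matrix) >= min_det."""
    return determinant(correlation_matrix(cov)) >= min_det


print(well_conditioned([[6, 3, 4], [3, 8, 5], [4, 5, 10]]))  # det(R) is about 0.48
print(well_conditioned([[1.0, 0.999], [0.999, 1.0]]))        # nearly collinear pair
```

A correlation determinant near zero signals that some measure is almost a linear combination of the others, which is the situation assumption (6) excludes.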